Effectiveness of Rich Document Representation in XML Retrieval
Authors
Abstract
Information retrieval (IR) systems are built with different goals in mind. Some target high precision, that is, placing more relevant documents on the first page of results. Others target high recall, that is, finding as many references as possible. In this paper we present a document representation method called Rich Document Representation (RDR) for building XML retrieval engines with high specificity, that is, engines that find more relevant documents that are mostly about the query topic. RDR represents the content of a document with logical terms and statements. Our conjecture is that, because RDR is a better representation of document content, it will produce higher precision. In our implementation, we used the vector space model to compute the similarity between XML elements and queries. Our experiments were conducted on the INEX 2004 test collection. The results indicate that using richer features such as logical terms or statements for XML retrieval tends to produce more focused retrieval. RDR is therefore a suitable document representation when users need only a few, more specific references and are more interested in precision than recall.
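As a rough illustration of the vector space step mentioned in the abstract, the sketch below scores each XML element against a query by cosine similarity over raw term counts. It is not the authors' implementation: the function names (`tokenize`, `element_texts`, `cosine`, `rank_elements`), the plain single-word features, and the unweighted term counts are all assumptions made for this example. RDR's logical terms and statements are not modelled here; they would stand in for the single-word features.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on non-alphanumeric characters (assumed tokenizer)."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def element_texts(xml_string):
    """Return (tag, text) for every element, where text includes nested elements."""
    root = ET.fromstring(xml_string)
    pairs = []
    for elem in root.iter():
        text = " ".join(elem.itertext()).strip()
        if text:
            pairs.append((elem.tag, text))
    return pairs


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_elements(xml_string, query):
    """Rank XML elements by similarity to the query, best match first."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine(Counter(tokenize(text)), query_vec), tag)
        for tag, text in element_texts(xml_string)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    doc = (
        "<article>"
        "<sec1>xml retrieval with vector space</sec1>"
        "<sec2>cooking pasta at home</sec2>"
        "</article>"
    )
    for score, tag in rank_elements(doc, "xml retrieval"):
        print(f"{tag}: {score:.3f}")
```

In this toy document the focused element `sec1` scores higher than the enclosing `article` element, because the extra text in the parent lengthens its vector without adding query terms.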
Similar resources
Prototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica
Background and Aim: The current research aims at prototyping query-by-humming music information retrieval systems for smart phones. Methods: This multi-method research follows simulation technique from mixed models of the operations research methodology, and the documentary research method, simultaneously. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...
Apply Uncertainty in Document-Oriented Database (MongoDB) Using F-XML
As we move to a big data world where unstructured data grows at high velocity, a data store is needed to hold this large volume of data. Relational databases, the traditional choice, are ill-suited to data at this scale, so a move to non-relational data stores is needed. In the current study, we have proposed an extension of the Mongo...
Context Driven XML Retrieval
This paper presents a data-centric approach to XML information retrieval which benefits from XML document structure and adapts traditional text-centric information retrieval techniques to deal with text content inside XML. We implement our ideas in a configurable, general purpose XML retrieval library which can be tuned to operate on multilingual XML resources with different structure and can b...
Proceedings of the SIGIR 2007 Workshop on Focused Retrieval
Determining the effectiveness of XML retrieval systems is crucial for improving information retrieval from XML document collections. Traditional effectiveness measures do not address the problem of overlap in the recall-base. At the Initiative for the Evaluation of XML retrieval (INEX), extended cumulated gain (XCG) was developed to address overlap. It works by comparing the cumulated score of ...